#install.packages("rstac")
#install.packages("sf")
#install.packages("gdalcubes")
#install.packages("terra")
#install.packages("viridis")Creating a raster cube from STAC items of the EOPF Zarr Samples Service using rstac and gdalcubes
Introduction
This notebook demonstrates how to create a raster data cube from STAC items from the EOPF Sentinel Zarr Samples Service using the rstac and gdalcubes packages in R. We will search for Sentinel-2 images roughly covering the Münsterland region in Germany for June 2025, and retrieve corresponding STAC items with rstac. We will then use gdalcubes to create a raster data cube and visualize an RGB plot aggregating the data to one image (using median) and compute the NDVI for that image.
The gdalcubes package facilitates the creation of regular raster cubes from irregular image collections with differing spatial and temporal resolutions, partial spatial overlap or differing projections etc. It first collects images in so-called image collections and creates data cubes from them, where the spatiotemporal geometry is defined via a data cube view. It also has a function to directly create image collections from STAC items, which we will use here. For details on the functionality and concepts, see the gdalcubes documentation and specifically Marius´ tutorial on the example of the Sentinel-2 COG catalog on Amazon Web Services (AWS).
What we will learn
- 🚀 Retrieve STAC items for Sentinel-2 imagery in a specified AOI (roughly “Münsterland” region, Germany) and time range (June 2025) using
rstac - 🔎 Create an image collection from STAC items using
stac_image_collection()ingdalcubes - 🛰️ Create raster data cubes with a user-defined spatiotemporal geometry using
cube_view()andraster_cube()ingdalcubesand perform a basic NDVI change analysis
As you will see, we will need some tweaking of the workflow here and there, since gdalcubes is currently not yet fully prepared for working with the EOPF Sentinel Zarr Samples Service, but the functionality is there and will hopefully be streamlined in future releases.
This now also works on Windows when using R >= 4.5.2, where an issue with reading blosc compressed files using gdal was fixed, see here
Prerequisites
Make sure to have R >=4.5.2. Install rstac, sf, terra, and gdalcubes. We will also use viridis for nice color maps, so we will install it .
Create image collection from STAC items
We can convert STAC items received through rstac to gdalcubes image collections using the function stac_image_collection(). From there on, we can create raster cubes using the usual workflow.
This notebook is only meant to showcase a simple example of the usage of gdalcubes with STAC items. It does not contain details on the concepts and workflow of creating and further processing raster cubes. For a more in-depth introduction, please refer to the gdalcubes documentation. This also contains a number of tutorials that explain the capabilities and potential applications of the package.
We start with a search for image data in the EOPF Zarr Samples Service. To interact with the EOPF STAC API, we will use rstac:
# Using a bbox roughly for "Münsterland" and the month June 2025.
# For the stac request, we need the bbox in WGS84
# We limit the no of results to 5 for testing purposes
library(rstac)
library(gdalcubes)
library(sf)Linking to GEOS 3.12.2, GDAL 3.11.3, PROJ 9.4.1; sf_use_s2() is TRUE
bbox_wgs84 = st_bbox(c(xmin = 6.55, xmax = 7.8, ymin = 51.8, ymax = 52.4), crs = st_crs(4326))
bbox_utm = st_transform(bbox_wgs84, "EPSG:32632")
stac_source = stac("https://stac.core.eopf.eodc.eu/")
items = stac_source |>
stac_search(collections = "sentinel-2-l2a",
datetime = "2025-06-01T00:00:00Z/2025-06-30T23:59:59Z",
bbox = c(bbox_wgs84["xmin"],bbox_wgs84["ymin"],
bbox_wgs84["xmax"],bbox_wgs84["ymax"]), limit = 5) |>
post_request()
length(items$features)[1] 5
items###Items
- features (5 item(s)):
- S2C_MSIL2A_20250630T104041_N0511_R008_T32UMD_20250630T161206
- S2C_MSIL2A_20250630T104041_N0511_R008_T32UMC_20250630T161206
- S2C_MSIL2A_20250630T104041_N0511_R008_T32ULD_20250630T161206
- S2C_MSIL2A_20250630T104041_N0511_R008_T32ULC_20250630T161206
- S2C_MSIL2A_20250630T104041_N0511_R008_T31UGU_20250630T161206
- assets:
AOT_10m, B01_20m, B02_10m, B03_10m, B04_10m, B05_20m, B06_20m, B07_20m, B08_10m, B09_60m, B11_20m, B12_20m, B8A_20m, product, product_metadata, SCL_20m, SR_10m, SR_20m, SR_60m, TCI_10m, WVP_10m
- item's fields:
assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type
###--> Helper function to adapt URLs (adding a prefix) to make them accessible for GDAL
f = function(href) {
return(paste0("ZARR:\"/vsicurl/", gsub(".zarr/", ".zarr\"/", href)))
}As you can see, we have received 5 STAC items matching our search criteria (we would have received more items, but we have set the limit to 5).
Why adapt the URLs?
The items returned from our STAC request have asset hrefs like this:
items$features[[1]]$assets$B02_10m$href [1] "https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202506-s02msil2a/30/products/cpm_v256/S2C_MSIL2A_20250630T104041_N0511_R008_T32UMD_20250630T161206.zarr/measurements/reflectance/r10m/b02"
In order for GDAL to use the correct driver, we need the URL with a prefix for the correct Zarr driver, see tutorial: https://eopf-sample-service.github.io/eopf-sample-notebooks/gdal-explore-zarr/ Also, we want it to point directly to the .zarr file. The function above replaces the href by one that links directly there and prepends the required prefix. Since you can give this function as an argument to stac_image_collection(), it will be applied to all assets automatically.
Now we can create an image collection from the received STAC items:
# Manually set argument "srs", because gdalcubes looks for proj:epsg, but in the collection there is only proj:code
# Also: since the items found are not all in the same UTM zone, we will filter them them first to have all images in one SRS )
s2_collection = stac_image_collection(items$features, srs = "EPSG:32632", asset_names = c("B02_10m", "B03_10m", "B04_10m", "B08_10m"), url_fun = f, property_filter = function(x) {x[["proj:code"]] == "EPSG:32632"})
s2_collectionImage collection object, referencing 4 images with 4 bands
Images:
name left
1 S2C_MSIL2A_20250630T104041_N0511_R008_T32UMD_20250630T161206 7.500940
2 S2C_MSIL2A_20250630T104041_N0511_R008_T32UMC_20250630T161206 7.531529
3 S2C_MSIL2A_20250630T104041_N0511_R008_T32ULD_20250630T161206 6.004761
4 S2C_MSIL2A_20250630T104041_N0511_R008_T32ULC_20250630T161206 6.065798
top bottom right datetime srs
1 53.24954 52.25345 9.146277 2025-06-30T10:40:41 EPSG:32632
2 52.35039 51.35443 9.143291 2025-06-30T10:40:41 EPSG:32632
3 53.24196 52.22621 7.678544 2025-06-30T10:40:41 EPSG:32632
4 52.34305 51.32805 7.704578 2025-06-30T10:40:41 EPSG:32632
Bands:
name offset scale unit nodata image_count
1 B02_10m 0 1 4
2 B03_10m 0 1 4
3 B04_10m 0 1 4
4 B08_10m 0 1 4
Currently, gdalcubes cannot read the SRS from the returned STAC items (due to the use of proj:code instead of proj:epsg metadata fields) and we have to define the spatial reference system (SRS) manually. However, this has been addressed in the latest gdalcubes version on GitHub, which is not yet available on CRAN. Once the SRS is detected automatically, gdalcubes can also handle different projections of images in the same image collection and reproject them on-the-fly when creating the raster cube. This will significantly simplify the workflow and allow to easily combine images from different UTM zones in one cube.
Create a cube view
After collecting all available images in an image collection, we will now create a raster data cube. For this, we first need to define a cube view, which specifies the desired spatio-temporal geometry of the raster cube (spatial resolution, extent, temporal resolution, time extent, etc.). For our example, we will select a small subset (10 x 10 km) north of the city of Muenster, which includes the “Rieselfelder” bird sanctuary. We collect the spatio-temporal extent in a list, which we then pass to cube_view(). We also specify a spatial resolution of 10m and a temporal resolution of 1 day. While the Sentinel-2 satellites have a revisit time of ~5 days (and in addition we have limited the number of items received in our STAC request to 5 in this example), swath overlaps may in principal lead to a higher temporal resolution for certain pixels, so we define 1 day here to get as much data from the image collection as we can, leading to a rather sparse cube. However, days and locations without images are interpreted as no-data values and will simply be ignored in the following steps (see https://gdalcubes.github.io/source/concepts/execution.html#chunking for details).
Even for the same day, swath overlaps between Sentinel-2 A and B may lead to certain locations being recorded twice. Thus, we always have to define an aggregation method. Here we use “first”: in the unlikely case that we have data from two images per day at a certain location, we simply choose the first.
#Collect spatio-temporal extent in a list
# We select a coordinate north of Muenster, and define a 10km x 10km extent around it
rieselX = 404500.13
rieselY = 5765279
ext = list()
ext$left = rieselX - 5000
ext$right = rieselX + 5000
ext$bottom = rieselY - 5000
ext$top = rieselY + 5000
ext$t0 = "2025-06-01"
ext$t1 = "2025-06-30"
# Create cube view with 10m spatial and 1 day temporal resolution, and the previously defined spatio-temporal extent
v = cube_view(srs="EPSG:32632", dx=10, dy=10, dt="P1D",
aggregation="first",
extent=ext)
vA data cube view object
Dimensions:
low high count pixel_size
t 2025-06-01 2025-06-30 30 P1D
y 5760279 5770279 1000 10
x 399500.13 409500.13 1000 10
SRS: "EPSG:32632"
Temporal aggregation method: "first"
Spatial resampling method: "near"
As you can see, the cube view summarizes the spatio-temporal geometry of the raster cube we want to create. We can see (amongst other information) that the resulting cube will have 1000 x 1000 pixels in space (10 km x 10 km at 10m resolution) and 30 time steps (1 per day for June 2025), though many of these pixels and time steps will be no-data values as we do not have complete daily coverage for the whole area.
Create cube and plot an RGB and an NDVI image
Now we´ll create a raster cube using the previously defined cube view. As an example application for such a raster cube, we´ll first plot an RGB image of the area. We´ll then select the red and near-infrared bands, and compute an NDVI. In both cases, we use our raster cube to aggregate the results for one month computing the median. This means, that for all locations where we received data on more than one day in that month, the median of all values of that pixel throughout the month is taken –> a very basic, but easy and straightforward way to select a value that is robust against outliers (e.g. clouds).
gdalcubes_options(parallel=4)
#create an RGB plot of the area, aggregated to one image for the whole month using median
raster_cube(s2_collection, v) |>
select_bands(c("B02_10m","B03_10m","B04_10m")) |>
reduce_time(c("median(B02_10m)", "median(B03_10m)", "median(B04_10m)")) |>
plot(rgb = 3:1, zlim = c(0,3000))#compute NDVI and aggregate to one image using median
ndvi <- raster_cube(s2_collection, v) |>
select_bands(c("B04_10m","B08_10m")) |>
apply_pixel("(B08_10m - B04_10m)/(B08_10m + B04_10m)", "NDVI") |>
reduce_time(c("median(NDVI)"))
plot(ndvi, zlim = c(-0.2,1),key.pos=1, col=viridis::viridis)Perform a simple change detection between April and June 2025.
gdalcubes can also be used to compute more complex analyses, such as change analysis between two time periods. However, for a task like this it is more straightforward to create two separate cubes for both montht, derives cloud-free composite NDVI images, and then use another package like terra to compute the difference image. We´ll apply the same approach as above to create an NDVI image for the same area in April. We´ll export both NDVI images (April and June) and use the package terra to compute a simple difference image highlighting changes between the two months.
library(terra)terra 1.8.60
Attaching package: 'terra'
The following objects are masked from 'package:gdalcubes':
animate, crop, size
items2 = stac_source |>
stac_search(collections = "sentinel-2-l2a",
datetime = "2025-04-01T00:00:00Z/2025-04-30T23:59:59Z",
bbox = c(bbox_wgs84["xmin"],bbox_wgs84["ymin"],
bbox_wgs84["xmax"],bbox_wgs84["ymax"]), limit = 5) |>
post_request()
s2_collection2 = stac_image_collection(items2$features, srs = "EPSG:32632", asset_names = c("B02_10m", "B03_10m", "B04_10m", "B08_10m"), url_fun = f, property_filter = function(x) {x[["proj:code"]] == "EPSG:32632"})
ext$t0 = "2025-04-01"
ext$t1 = "2025-04-30"
v2 = cube_view(srs="EPSG:32632", dx=10, dy=10, dt="P1D",
aggregation="first",
extent=ext)
ndvi2 <- raster_cube(s2_collection2, v2) |>
select_bands(c("B04_10m","B08_10m")) |>
apply_pixel("(B08_10m - B04_10m)/(B08_10m + B04_10m)", "NDVI") |>
reduce_time(c("median(NDVI)"))
ndvi_april <- rast(write_tif(ndvi2))
ndvi_june <- rast(write_tif(ndvi))
ndvi_change <- ndvi_june - ndvi_april
plot(ndvi_change, main = "NDVI Change April 2025 - June 2025")#system.time(raster_cube(s2_collection4, v4) |>
# select_bands(c("B04_10m","B08_10m")) |>
# reduce_time(c("median(B04_10m)","median(B08_10m)")) |>
# apply_pixel("(B08_10m_median - B04_10m_median)/(B08_10m_median + B04_10m_median)", "NDVI") |>
# plot(zlim = c(-0.2,1),key.pos=1, col=viridis::viridis))